% perm filename TPX.TEX[TEX,DEK] blob sn#841370 filedate 1987-06-12 generic text, type T, neo UTF8
% exercises for TeX: The Program

\hsize=6.5in \vsize=8.75in % TUGboat (cleared by BB, Jan 87)
\font\tenbxsl=cmbxsl10 % font for slanted copy in title

% the following macros were used to make all the class handouts
\def\\#1{\hbox{\it#1\/\kern.05em}} % italic type for identifiers
\font\tentex=cmtex10 % TeX extended character set (used in strings)
\hyphenchar\tentt=-1 \hyphenchar\tentex=-1
\def\.#1{\hbox{\tentex % typewriter type for strings
  \let\'=\RQ % right quote in a string
  \let\`=\LQ % left quote in a string
  #1}}
\def\LQ{{\tt\char'22}} % left quote in a string
\def\RQ{{\tt\char'23}} % right quote in a string
\def\O#1{\hbox{\rm\char'23\kern-.2em\it#1\/\kern.05em}} % octal constant

\def\begintt{$$\ttverbatim \catcode`\|=0 \parskip=0pt \ttfinish}
\chardef\other=12
\def\ttverbatim{\begingroup
  \catcode`\\=\other \catcode`\{=\other \catcode`\}=\other
  \catcode`\$=\other \catcode`\&=\other \catcode`\#=\other
  \catcode`\%=\other \catcode`\~=\other \catcode`\_=\other
  \catcode`\^=\other
  \obeyspaces \obeylines \tt}
{\obeyspaces\gdef {\ }}
{\catcode`\|=0 |catcode`|\=\other % | is temporary escape character
  |obeylines % end of line is active
  |gdef|ttfinish#1^^M#2\endtt{#1|vbox{#2}|endgroup$$}}
\def\|{{\tt\char`\|}}
\catcode`\|=\active \def|{\ttverbatim\let|=\endgroup}

\def\prob#1. {\medbreak\noindent\hbox to\parindent{\bf #1.\hfil}}

\begingroup
\hsize=3.18in
\tracingparagraphs=1 \pretolerance=-1
\prob 30. When your instructor made up this problem, he said
`|\tracingparagraphs=1|' so that his transcript file would explain why
\TeX\ has broken the paragraph\break into lines in a particular way. He also said\break
`|\pretolerance=-1|' so that hyphenation would be tried immediately.
The output is shown on the next page; use it to determine
what line breaks would have\break been found by a simpler algorithm
that breaks one line at a time. (The simpler algorithm finds the
breakpoint that yields fewest demerits on the first line,
then chooses it and starts over again.)

\endgroup
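
\medbreak\noindent
[A reminder that may help with the line-breaking problem; it is recalled from
Chapter~14 of {\sl The \TeX book\/} and is not part of the problem statement.
A line with badness~$b$ that ends at a break with penalty~$p$ is charged
$$d=\cases{(l+b)^2+p^2,&if $0\le p<10000$;\cr
  (l+b)^2-p^2,&if $-10000<p<0$;\cr
  (l+b)^2,&if $p\le-10000$;\cr}$$
demerits, where $l$ is the current value of |\linepenalty|. Additional
demerits (|\adjdemerits|, |\doublehyphendemerits|, |\finalhyphendemerits|)
are added in the situations described in that chapter. The
one-line-at-a-time algorithm minimizes this quantity for the current line
only, whereas \TeX\ minimizes the total over the whole paragraph.]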

\bye
\pageinsert 
\vbox to\vsize{(here I'll put the paragraph trace stuff) \vfill}
\endinsert

\prob 30. Play through the algorithms in parts 42 and 43, to figure
out the contents of \\{trie\_op}, \\{trie\_char}, \\{trie\_link},
\\{hyf\_distance}, \\{hyf\_num}, and \\{hyf\_next} after the statement
\begintt
\patterns{a1bc 2bcd3 ab1cd}
\endtt
has been processed. Then execute the algorithm of \S923, to see
how \TeX\ uses this efficient trie structure
to set the values of \\{hyf} when the word |aabcd| is hyphenated.
[The value of \\{hn} will be~5, and the values of \\{hc}[$1\,.\,.\,5$]
will be $(96,96,97,98,99)$, respectively, when \S923 begins.]

\prob 31. The \\{save\_stack} is normally empty when
a \TeX\ program stops. But if,
say, the user's input has an extra `|{|' (or a missing `|}|'), \TeX\
will print the warning message
\begintt
(\end occurred inside a group at level 1)
\endtt
(see \S1335).

Explain in detail how to change \TeX\ so that such warning messages will be
more explicit. For example, if the source program has an unmatched `|{|'
on line~6 and an unmatched `|\begingroup|' on line~25, your modified \TeX\
should give two warnings:
\begintt
(\end occurred when \begingroup on line 25 was incomplete)
(\end occurred when { on line 6 was incomplete)
\endtt
You may assume that \\{simple\_group} and \\{semi\_simple\_group} are the
only group codes present on \\{save\_stack} when \S1335 is encountered;
if other group codes are present, your program should call \\{confusion}.

\prob 32. (The following question is the most difficult yet most important
of the entire collection. It was the main problem on the take-home final exam.)

\def\TeXX{\TeX\kern-.3emX}
The purpose of this problem is to extend \TeX\ so that it will sell better
in China and Japan. The extended program, called \TeXX, allows each font
to contain up to 65536 characters. Each extended character is represented
by two values, its `extension'~$x$ and its `code'~$c$, where both $x$ and~$c$
lie between 0 and~255 inclusive. Characters with the same `$c$' but different
`$x$' correspond to different graphics; but they have the same width, height,
depth, and italic correction.

\TeXX\ is identical to \TeX\ except that it has one new primitive command:
|\xchar|. If\/ |\xchar| occurs in vertical mode, it begins a new paragraph;
i.e., it's a $\langle$horizontal command$\rangle$ as on p.~283 of {\sl The
\TeX book}. If\/ |\xchar| occurs in horizontal mode it should be followed
by a $\langle$number$\rangle$ between 0 and 65535; this number can be converted
to the form $256x+c$, where $0\le x,c<256$. The corresponding extended character
from the current font will be appended to the current horizontal list, and the
space factor will be set to 1000.
(If $x=0$, the effect of\/ |\xchar| is something like the effect of\/ |\char|,
except that |\xchar| disables ligatures and kerns and it doesn't do anything
special to the space factor. Moreover, no penalty is inserted
after an |\xchar| that happens to be the |\hyphenchar| of the current font.)
A word containing an extended character will not be hyphenated.
The |\xchar| command should not occur in math mode.
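
The conversion of a $\langle$number$\rangle$ to the form $256x+c$ can be
tried out at the macro level in plain \TeX. The following is only a sketch
of the arithmetic, not part of \TeXX\ itself; the names |\xx|, |\cc|, and
|\splitxchar| are made up for this illustration, and |\count255| is used as
a scratch register.
\begintt
\newcount\xx \newcount\cc
\def\splitxchar#1{\cc=#1\relax         % n = 256x+c, assumed 0..65535
  \xx=\cc \divide\xx by 256            % x = n div 256
  \count255=\xx \multiply\count255 by 256
  \advance\cc by -\count255 }          % c = n - 256x
\splitxchar{600}
\message{x=\the\xx, c=\the\cc}
\endtt
With plain \TeX\ this should report |x=2, c=88|, and 88 is |"58|. Integer
division by |\divide| truncates, which is what is wanted here because the
number is nonnegative.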

Inside \TeXX, an extended character $(x,c)$ in font~$f$ is represented by
two consecutive \\{char\_node} items $p$ and~$q$, where we have
$\\{font}(p)=\\{null\_font}$, $\\{character}(p)=\\{qi}(x)$, $\\{link}(p)=q$,
$\\{font}(q)=f$, and $\\{character}(q)=\\{qi}(c)$. This two-word representation
is used even when $x=0$.

\TeXX\ typesets an extended character by specifying character number $256x+c$
in the |DVI| file. (See the \\{set2} command in \S585.)
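
As a worked instance of this encoding, assuming the standard |DVI| opcode
numbering in which \\{set2} is byte~129 followed by a two-byte parameter with
the high-order byte first, the character |\xchar600| would be output as
$$600=256\cdot2+88\quad\Longrightarrow\quad
  \hbox{bytes }129,\;2,\;88\qquad(x=2,\ c=88=\hbox{\tt"58}).$$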

If \TeXX\ is run with the macros of plain \TeX, and if the user types
`|\tracingall| |\xchar600| |\showlists|', the output of \TeXX\ will
include
\begintt
{\xchar}
{horizontal mode: \xchar}
{\showlists}
|noindent|null
### horizontal mode entered at line 0
\hbox(0.0+0.0)x20.0
\tenrm \xchar"258
spacefactor 1000
\endtt
(since 600 is |"258| in hexadecimal notation).

Your job is to explain in detail {\it all\/} changes to \TeX\ that are
necessary to convert it to \TeXX.

[Note: A properly designed extension would also include the primitive operator
|\xchardef|, analogous to |\chardef| and |\mathchardef|, because a language
should be `orthogonally complete'.
However, this additional extension has not been included as part
of problem~32, because it presents no special difficulties. Anybody who
can figure out how to implement |\xchar| can certainly also handle |\xchardef|.]
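
For comparison, here is how the two existing operators are used in plain
\TeX, followed by what the analogous new operator might look like. The name
|\kanji| and the last line are purely hypothetical, since the syntax of
|\xchardef| is not specified anywhere in this handout.
\begintt
\chardef\ae="1A           % \ae now means \char"1A
\mathchardef\sum="1350    % \sum now means \mathchar"1350
\xchardef\kanji="258      % hypothetical: \kanji would mean \xchar"258
\endtt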

\prob 33. The first edition of {\sl \TeX: The Program\/} suggested that
extended characters could be represented with the following convention:
The first of two consecutive \\{char\_node} items was to contain the font
code and a character code from which the dimensions could be computed
as usual; the second \\{char\_node} was a \\{halfword} giving the actual
character number to be typeset. Fonts were divided into two types,
based on characteristics of their |TFM| headers; `oriental' fonts always
used this two-word representation, other fonts always used the one-word
representation.

Explain why the method suggested in problem 32 is better than this.
(There are at least two reasons.)

\bye